Recently, with the digitization of medicine, the use of real-world medical data collected at clinical sites has been attracting attention. In this study, quantum computing is applied to a linear non-Gaussian acyclic model (LiNGAM) to discover causal relationships solely from real-world medical data. Specifically, the independence measures of DirectLiNGAM, a causal discovery algorithm, are computed using quantum kernels, and the accuracy is verified on real-world medical data. When DirectLiNGAM with quantum kernels (qLiNGAM) was applied to real-world medical data, a case was confirmed in which the causal structure could be correctly estimated even when the data size was small, which was not possible with existing methods. Moreover, qLiNGAM was implemented in experiments using IBMQ. It is suggested that qLiNGAM may be able to discover new medical knowledge and contribute to the solution of medical problems, even when only a small amount of data is available.
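For context, here is a minimal sketch of the pairwise LiNGAM idea that DirectLiNGAM builds on: the causal direction between two variables is chosen so that the regression residual is as independent of the regressor as possible. A classical HSIC-style kernel statistic stands in for the quantum kernel here, and the toy data, kernel bandwidth, and function names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """RBF Gram matrix for a 1-D sample; a classical stand-in for the quantum kernel."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def hsic(x, y):
    """Biased HSIC estimate: a kernel measure of dependence between 1-D samples."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = rbf_gram(x), rbf_gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def pairwise_lingam_direction(x, y):
    """Pick 'x->y' or 'y->x' by comparing independence of regressor and residual."""
    r_xy = y - (np.cov(x, y, bias=True)[0, 1] / np.var(x)) * x  # residual of y on x
    r_yx = x - (np.cov(x, y, bias=True)[0, 1] / np.var(y)) * y  # residual of x on y
    return 'x->y' if hsic(x, r_xy) < hsic(y, r_yx) else 'y->x'

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)                 # non-Gaussian cause
y = 0.8 * x + rng.uniform(-0.3, 0.3, 200)   # linear effect, non-Gaussian noise
print(pairwise_lingam_direction(x, y))      # expected: 'x->y'
```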
Mixup is a popular data augmentation technique for training deep neural networks, where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. We then propose a new method to improve Mixup based on this insight. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across various datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.
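Since the analysis above centers on the Mixup operation itself, a minimal sketch of the standard formulation may be useful; this is the textbook baseline, not the paper's proposed improvement, and the Beta parameter and batch shapes are illustrative.

```python
import torch

def mixup(x, y, alpha=0.2):
    """Standard Mixup: convex combination of a batch with a shuffled copy of itself.

    x: (B, ...) input batch; y: (B, C) one-hot labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing weight
    perm = torch.randperm(x.size(0))                       # random pairing
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y + (1 - lam) * y[perm]
    return x_mix, y_mix

x = torch.randn(32, 3, 224, 224)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (32,)), 10).float()
x_mix, y_mix = mixup(x, y)
```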
Deep image prior (DIP) has recently attracted attention for unsupervised positron emission tomography (PET) image reconstruction, as it does not require any prior training dataset. In this paper, we present the first attempt to implement an end-to-end DIP-based fully 3D PET image reconstruction method that incorporates a forward-projection model into the loss function. To make fully 3D PET image reconstruction practical, which was previously infeasible due to graphics processing unit memory limitations, we modify the DIP optimization to a block-iteration scheme and sequentially learn an ordered sequence of block sinograms. Furthermore, a relative difference penalty (RDP) term is added to the loss function to enhance quantitative PET image accuracy. We evaluated our proposed method using a Monte Carlo simulation with [$^{18}$F]FDG PET data of a human brain and a preclinical study on monkey brain [$^{18}$F]FDG PET data. The proposed method was compared with maximum-likelihood expectation maximization (EM), maximum-a-posteriori EM with RDP, and hybrid DIP-based PET reconstruction methods. The simulation results showed that the proposed method improved PET image quality by reducing statistical noise and better preserved the contrast of brain structures and an inserted tumor compared with the other algorithms. In the preclinical experiment, finer structures and better contrast recovery were obtained by the proposed method. This indicates that the proposed method can produce high-quality images without a prior training dataset. Thus, the proposed method is a key enabling technology for the straightforward and practical implementation of end-to-end DIP-based fully 3D PET image reconstruction.
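A toy sketch of the core loss construction may help: the DIP network maps a fixed input to an image, a system matrix projects it to sinogram space, and the loss compares the projection with the measurement. The dense toy projector, the MSE data term, and the simple squared-difference penalty standing in for RDP are illustrative assumptions; the paper's block-iteration scheme would cycle this loop over an ordered sequence of block sinograms.

```python
import torch

n_pix, n_bins = 32 * 32, 1500
A = torch.rand(n_bins, n_pix) * (torch.rand(n_bins, n_pix) < 0.05).float()  # toy projector
z = torch.randn(1, 16, 32, 32)                   # fixed network input

net = torch.nn.Sequential(                       # stand-in for the DIP network
    torch.nn.Conv2d(16, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 1, 3, padding=1), torch.nn.Softplus(),
)
y_meas = torch.rand(n_bins)                      # measured sinogram (toy data)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    img = net(z).flatten()                       # current image estimate
    y_hat = A @ img                              # forward projection
    data_term = torch.nn.functional.mse_loss(y_hat, y_meas)
    im = img.view(32, 32)
    penalty = im.diff(dim=0).pow(2).mean() + im.diff(dim=1).pow(2).mean()
    loss = data_term + 1e-2 * penalty            # RDP-like smoothness term
    opt.zero_grad(); loss.backward(); opt.step()
```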
Slimmable Neural Networks (S-Net) are networks that can dynamically select one of several predefined proportions of channels (sub-networks) depending on current computational resource availability. The accuracy of each sub-network in S-Net, however, is inferior to that of individually trained networks of the same size due to the difficulty of simultaneously optimizing different sub-networks. In this paper, we propose Slimmable Pruned Neural Networks (SP-Net), whose sub-network structures are learned by pruning, instead of adopting structures with the same proportion of channels in each layer (a width multiplier) as in S-Net. We also propose a new pruning procedure, multi-base pruning, which replaces one-shot or iterative pruning to achieve high accuracy and large training-time savings. We further introduce slimmable channel sorting (scs) to achieve computation as fast as S-Net, and zero padding match (zpm) pruning to prune residual structures more efficiently. SP-Net can be combined with any kind of channel pruning method and does not require any complicated processing or time-consuming architecture search like NAS models. Compared with each sub-network of the same FLOPs on S-Net, SP-Net improves accuracy by 1.2-1.5% for ResNet-50, 0.9-4.4% for VGGNet, 1.3-2.7% for MobileNetV1, and 1.4-3.1% for MobileNetV2 on ImageNet. Furthermore, our methods outperform other SOTA pruning methods and are on par with various NAS models according to our experimental results on ImageNet. The code is available at https://github.com/hideakikuratsu/SP-Net.
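To make the slimmable baseline concrete, here is a minimal sketch of the S-Net-style shared-weight slicing that SP-Net departs from: one weight tensor serves several sub-networks, and a width multiplier selects how many channels are active. SP-Net would instead learn per-layer channel counts by pruning, which is not shown; layer sizes and names are illustrative.

```python
import torch

class SlimmableLinear(torch.nn.Module):
    """S-Net-style layer: only the first int(width * out_features) output
    channels are used at run time, so one weight tensor serves all widths."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x, width=1.0):
        k = max(1, int(width * self.weight.size(0)))          # active channels
        return torch.nn.functional.linear(x, self.weight[:k], self.bias[:k])

layer = SlimmableLinear(128, 64)
x = torch.randn(8, 128)
for w in (0.25, 0.5, 0.75, 1.0):        # predefined width multipliers
    print(w, layer(x, width=w).shape)   # output width shrinks with w
```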
Bayesian inference offers principled tools to tackle many critical problems with modern neural networks, such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way to perform approximate inference and estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal, which can be difficult to approximate with standard variational inference, and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data and provides uncertainty estimates that lead to better performance in downstream tasks.
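A minimal sketch of the Monte Carlo Dropout baseline that GFlowOut generalizes: dropout stays active at test time, and repeated stochastic forward passes yield a predictive mean and an uncertainty estimate. GFlowOut would replace the fixed Bernoulli mask distribution with a learned GFlowNet posterior, which is not shown; the architecture and sample counts are illustrative.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Dropout(p=0.5),
    torch.nn.Linear(64, 1),
)
net.train()                                    # keep dropout sampling active
x = torch.randn(8, 16)
with torch.no_grad():
    samples = torch.stack([net(x) for _ in range(100)])  # 100 mask draws
mean = samples.mean(0)                         # predictive mean
std = samples.std(0)                           # uncertainty estimate per input
```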
Practical results have shown that deep learning optimizers using small constant learning rates, hyperparameters close to one, and large batch sizes can find the model parameters of deep neural networks that minimize the loss functions. We first show theoretical evidence that the momentum method (Momentum) and adaptive moment estimation (Adam) perform well, in the sense that the upper bound of a theoretical performance measure is small, with a small constant learning rate, hyperparameters close to one, and a large batch size. Next, we show that there exists a batch size, called the critical batch size, that minimizes the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, and that the SFO complexity increases once the batch size exceeds the critical batch size. Finally, we provide numerical results that support our theoretical results; namely, they indicate that Adam using a small constant learning rate, hyperparameters close to one, and the critical batch size minimizing SFO complexity converges faster than Momentum and stochastic gradient descent (SGD).
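To make the critical-batch-size claim concrete, here is a stylized calculation; the functional form of $K(b)$ is an assumption for exposition, not the paper's actual bound. Suppose the number of steps $K(b)$ needed to reach an $\epsilon$-approximation decreases in the batch size $b$ and saturates, e.g., $K(b) = \frac{A b}{b - c}$ for constants $A, c > 0$ and $b > c$. The SFO complexity, i.e., the total number of stochastic gradient computations, is then
$$\mathrm{SFO}(b) = K(b)\,b = \frac{A b^2}{b - c}, \qquad \frac{d}{db}\,\mathrm{SFO}(b) = \frac{A b (b - 2c)}{(b - c)^2},$$
so the SFO complexity is minimized at the critical batch size $b^\star = 2c$ and increases once $b$ exceeds $b^\star$.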
Automatic image-based disease severity estimation typically uses discrete (i.e., quantized) severity labels. Because images are ambiguous, annotating discrete labels is often difficult. An easier alternative is to use relative annotations, which compare the severity between pairs of images. By using a learning-to-rank framework with relative annotations, we can train a neural network that estimates rank scores relative to severity. However, relative annotation of all possible pairs is prohibitive, and therefore appropriate sample-pair selection is mandatory. This paper proposes deep Bayesian active learning-to-rank, which trains a Bayesian convolutional neural network while automatically selecting appropriate pairs for relative annotation. We confirmed the efficiency of the proposed method through experiments on endoscopic images of ulcerative colitis. In addition, we confirmed that our method is useful even under severe class imbalance because it can automatically select samples from minority classes.
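A minimal sketch of the learning-to-rank objective behind relative annotations: a network assigns each image a scalar severity score, and a Bradley-Terry / RankNet-style loss pushes the score of the more severe image of an annotated pair above the other. The Bayesian CNN and the active selection of which pairs to annotate are not shown; the scorer architecture and shapes are illustrative.

```python
import torch

scorer = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 1))

def pairwise_rank_loss(x_worse, x_better):
    """BCE on sigmoid(score_better - score_worse); the target is always 1."""
    diff = scorer(x_better) - scorer(x_worse)
    return torch.nn.functional.binary_cross_entropy_with_logits(
        diff, torch.ones_like(diff))

x_a = torch.randn(16, 3, 32, 32)   # images annotated as less severe
x_b = torch.randn(16, 3, 32, 32)   # paired images annotated as more severe
loss = pairwise_rank_loss(x_a, x_b)
loss.backward()                    # scores learn to respect the relative order
```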
Deep neural networks perform well on prediction and classification tasks in the canonical setting where data streams are i.i.d., labeled data is abundant, and class labels are balanced. Challenges emerge with distribution shifts, including non-stationary or imbalanced data streams. One powerful approach to addressing this challenge is self-supervised pretraining of large encoders on volumes of unlabeled data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging because a large number of weights needs to be fine-tuned, and as a result, the models forget information about previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck that contains pairs of separate and learnable (key, value) codes. In this setup, we follow an encode-process-decode paradigm: the input is fed into a pretrained encoder, the encoder's output is used to select the nearest key in the discrete bottleneck, and the corresponding value is fed to the decoder to solve the current task. The model can only fetch and reuse a limited number of these (key, value) pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the proposed model to minimize the effect of distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class. We empirically verify the benefits of the proposed method under challenging distribution-shift scenarios across various benchmark datasets, and show that the proposed model reduces the common vulnerability to non-i.i.d. and non-stationary training distributions compared with various other baselines.
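A minimal sketch of the discrete (key, value) bottleneck lookup described above: a frozen encoder produces a query, the nearest learnable key is selected, and the corresponding learnable value is passed to the decoder. The single-key (rather than multi-codebook) lookup, the sizes, and the linear stand-ins for the encoder and decoder are simplifying assumptions.

```python
import torch

num_pairs, d_key, d_val = 512, 64, 32
keys = torch.nn.Parameter(torch.randn(num_pairs, d_key))    # learnable keys
values = torch.nn.Parameter(torch.randn(num_pairs, d_val))  # learnable values
encoder = torch.nn.Linear(128, d_key)   # stand-in for a pretrained encoder
decoder = torch.nn.Linear(d_val, 10)    # task-specific decoder

x = torch.randn(8, 128)
with torch.no_grad():                   # the encoder stays frozen
    q = encoder(x)                      # (8, d_key) queries
idx = torch.cdist(q, keys).argmin(dim=1)   # nearest key per input
logits = decoder(values[idx])           # only the fetched values reach the decoder
```

Because the argmin is non-differentiable, gradients flow only into the fetched values (and the decoder), which is what makes the updates localized to the pairs actually used.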
This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be tightly connected in a data-dependent manner. Our bounds improve previous bounds in two directions, addressing open problems that have seen little development since 2010. The first is to reduce the dependence on the covering number. The second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. Experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables, which is of independent interest beyond robustness and generalization.
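For context, a simplified statement of the classical robustness bound that this line of work improves on (in the spirit of Xu and Mannor, 2010; the exact constants here are quoted from memory and should be treated as approximate): if a learning algorithm is $(K, \epsilon(S))$-robust, i.e., the input space can be partitioned into $K$ cells such that the loss varies by at most $\epsilon(S)$ within any cell containing a training point, and the loss is bounded by $M$, then with probability at least $1 - \delta$ over $n$ samples,
$$\bigl|\mathcal{L}(h) - \hat{\mathcal{L}}_S(h)\bigr| \;\le\; \epsilon(S) + M\sqrt{\frac{2K\ln 2 + 2\ln(1/\delta)}{n}}.$$
The two improvements described above target precisely this bound's dependence on $K$ (typically a covering number of the input space) and on the hypothesis space.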
The convergence and convergence-rate analyses of adaptive methods, such as adaptive moment estimation (Adam) and its variants, have been widely studied for nonconvex optimization. These analyses are based on the assumption that the expected or empirical average loss function is Lipschitz smooth (i.e., its gradient is Lipschitz continuous), and the learning rates depend on the Lipschitz constant of the Lipschitz continuous gradient. Meanwhile, numerical evaluations of Adam and its variants have clarified that using a small constant learning rate that does not depend on the Lipschitz constant, together with hyperparameters ($\beta_1$ and $\beta_2$) close to one, is advantageous for training deep neural networks. Since computing the Lipschitz constant is NP-hard, the Lipschitz smoothness condition is unrealistic. This paper provides a theoretical analysis of Adam without assuming the Lipschitz smoothness condition, in order to bridge the gap between theory and practice. The main contribution is to show theoretical evidence that Adam performs well with a small learning rate and hyperparameters close to one, whereas all previous theoretical results were for hyperparameters close to zero. Our analysis also leads to the finding that Adam performs well with large batch sizes. Moreover, we show that Adam performs well when it uses diminishing learning rates and hyperparameters close to one.
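For reference, a sketch of the standard Adam update in the regime the analysis supports: a small constant learning rate $\alpha$ chosen without knowledge of any Lipschitz constant, and $\beta_1, \beta_2$ close to one. The specific values and the toy objective below are illustrative defaults, not the paper's prescription.

```python
import torch

alpha, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8   # betas close to one
theta = torch.zeros(5, requires_grad=True)
m = torch.zeros_like(theta)   # first-moment estimate
v = torch.zeros_like(theta)   # second-moment estimate

for t in range(1, 101):
    loss = ((theta - 1.0) ** 2).sum()       # toy objective
    grad, = torch.autograd.grad(loss, theta)
    m = beta1 * m + (1 - beta1) * grad      # exponential moving averages
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)            # bias correction
    v_hat = v / (1 - beta2 ** t)
    with torch.no_grad():
        theta -= alpha * m_hat / (v_hat.sqrt() + eps)  # constant-step update
```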